Extending WordNet with Hypernyms and Siblings Acquired from Wikipedia
نویسندگان
چکیده
This paper proposes a method for extending WordNet with terms in Wikipedia. Our method identifies a WordNet synset by integrating evidence derived from the structure of an article in Wikipedia and distributional similarity of terms. Unlike previous methods, utilizing the hypernym and siblings of the target term acquired from Wikipedia, the proposed method can deal with terms other than Wikipedia article titles and can work well even when reliable distributional similarity of a target term is unavailable. Experiments show that the proposed method can identify synsets for 2,039,417 inputs at precision rate of 84%. Furthermore, it is estimated from the experimental results that there should be 328,572 terms among all the inputs whose synset our method can correctly identify, while previous methods relying only on distributional similarity and lexico-syntactic patterns cannot.
منابع مشابه
Linking Dutch Wikipedia Categories to EuroWordNet
Wikipedia provides category information for a large number of named entities but the category structure of Wikipedia is associative, and not always suitable for linguistic applications. For this reason, a merger of Wikipedia andWordNet has been proposed. In this paper, we address the word sense disambiguation problem that needs to be solved when linking Dutch Wikipedia categories to polysemous ...
متن کاملQuery Refinement and User Relevance Feedback for Contextualized Image Retrieval
The motivation of this paper is to increase the user perceived precision of results of Content Based Information Retrieval (CBIR) systems with Query Refinement (QR), Visual Analysis (VA) and Relevance Feedback (RF) algorithms. The proposed algorithms were implemented as modules into K-Space CBIR system. The QR module discovers hypernyms for the given query from a free text corpus (Wikipedia) an...
متن کاملGeo-WordNet: Automatic Georeferencing of WordNet
WordNet has been used extensively as a resource for the Word Sense Disambiguation (WSD) task, both as a sense inventory and a repository of semantic relationships. Recently, we investigated the possibility to use it as a resource for the Geographical Information Retrieval task, more specifically for the toponym disambiguation task, which could be considered a specialization of WSD. We found tha...
متن کاملUsing Wikipedia for Hierarchical Finer Categorization
Wikipedia is one of the largest growing structured resources on the Web and can be used as a training corpus in natural language processing applications. In this work, we present a method to categorize named entities under the hierarchical fine-grained categories provided by the Wikipedia taxonomy. Such a categorization can be further used to extract semantic relations among these named entitie...
متن کاملAutomatically Extending NE coverage of Arabic WordNet using Wikipedia
This paper focuses on the automatic extraction of Arabic Named Entities (NEs) from the Arabic Wikipedia (AWP), their automatic attachment to Arabic WordNet (AWN) and their automatic link to Princeton's English WordNet (PWN). We briefly report on the current status of AWN, focusing on its rather limited NE coverage. Our proposal of automatic extension is then presented, applied and evaluated. Ke...
متن کامل